18 research outputs found

    synder: inferring genomic orthologs from synteny maps

    Ortholog inference is a key step in understanding the evolution and function of a gene or other genomic feature. Yet often no similar sequence can be identified, or the true ortholog is hidden among false positives. A solution is to consider the sequence's genomic context. We present the generic program, synder, for tracing features of interest between genomes based on a synteny map. This approach narrows genomic search-space independently of the sequence of the feature of interest. We illustrate the utility of synder by finding orthologs for the Arabidopsis thaliana 13-member gene family of Nuclear Factor YC transcription factor across the Brassicaceae clade.
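
    The idea of narrowing the search space with a synteny map can be illustrated with a minimal sketch. This is not synder's actual algorithm or interface: the `Block` type, the `search_interval` function, and the coordinates are hypothetical, and the sketch assumes a single chromosome pair with collinear (non-inverted) syntenic blocks. A feature that overlaps syntenic blocks is mapped to the target span of those blocks; otherwise it is bounded by the nearest flanking blocks.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """One syntenic block: a query interval paired with a target interval (hypothetical type)."""
    q_start: int
    q_end: int
    t_start: int
    t_end: int


def search_interval(blocks: List[Block], f_start: int, f_end: int) -> Optional[Tuple[int, int]]:
    """Return a target-genome interval expected to contain the query feature.

    Assumes collinear blocks on one chromosome pair; returns None when the
    feature cannot be bounded on both sides.
    """
    # Case 1: the feature overlaps one or more blocks, so use their target span.
    overlapping = [b for b in blocks if b.q_start <= f_end and b.q_end >= f_start]
    if overlapping:
        return (min(b.t_start for b in overlapping),
                max(b.t_end for b in overlapping))

    # Case 2: the feature lies between blocks, so bound it by the nearest flanks.
    upstream = [b for b in blocks if b.q_end < f_start]
    downstream = [b for b in blocks if b.q_start > f_end]
    if not upstream or not downstream:
        return None
    left = max(upstream, key=lambda b: b.q_end)
    right = min(downstream, key=lambda b: b.q_start)
    lo, hi = sorted((left.t_end, right.t_start))
    return (lo, hi)


# Made-up coordinates for illustration only.
blocks = [Block(100, 200, 1000, 1100), Block(500, 600, 1400, 1500)]
between = search_interval(blocks, 300, 400)
inside = search_interval(blocks, 150, 250)
unbounded = search_interval(blocks, 700, 800)
```

    The returned interval depends only on the feature's coordinates and the map, not on the feature's sequence, which is the property the abstract emphasizes.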

    System-wide transcriptome damage and tissue identity loss in COVID-19 patients

    The molecular mechanisms underlying the clinical manifestations of coronavirus disease 2019 (COVID-19), and what distinguishes them from common seasonal influenza virus and other lung injury states such as acute respiratory distress syndrome, remain poorly understood. To address these challenges, we combine transcriptional profiling of 646 clinical nasopharyngeal swabs and 39 patient autopsy tissues to define body-wide transcriptome changes in response to COVID-19. We then match these data with spatial protein and expression profiling across 357 tissue sections from 16 representative patient lung samples and identify tissue-compartment-specific damage wrought by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, evident as a function of varying viral loads during the clinical course of infection and tissue-type-specific expression states. Overall, our findings reveal a systemic disruption of canonical cellular and transcriptional pathways across all tissues, which can inform subsequent studies to combat the mortality of COVID-19 and to better understand the molecular dynamics of lethal SARS-CoV-2 and other respiratory infections.
    • Across all organs, fibroblast and immune cell populations increase in COVID-19 patients
    • Organ-specific cell types and functional markers are lost in all COVID-19 tissue types
    • Lung compartment identity loss correlates with SARS-CoV-2 viral loads
    • COVID-19 uniquely disrupts co-occurrence cell type clusters (different from IAV/ARDS)
    Park et al. report system-wide transcriptome damage and tissue identity loss wrought by SARS-CoV-2, influenza, and bacterial infection across multiple organs (heart, liver, lung, kidney, and lymph nodes) and provide a spatiotemporal landscape of COVID-19 in the lung.

    Pan-tissue pan-cancer characterization of novel human orphan genes via analysis of RNA-Sequencing data

    The recently emerged, young orphan genes provide an organism with a cadre of unique species-specific proteins. Since they were first described 25 years ago, several functionally important orphan genes from diverse species have been characterized. However, there remain significant lacunae in our knowledge about the origins and functions of orphan genes. In a bid to decipher the "dark transcriptome", a result of pervasive transcription, researchers are exploring high-throughput sequencing-based gene annotation methods. Recent studies have made efforts to compile a comprehensive human transcriptome using data from experimental approaches such as RNA-Seq, Ribo-Seq, and proteomics. A number of these studies continue to ignore orphan genes because of their non-canonical features, such as short length and lack of introns. There is a growing interest in cataloging the unannotated novel transcripts and ORFs and understanding their roles in the context of human physiology and diseases like cancer. These novel transcripts include unannotated genes, small Open Reading Frames (smORFs) of < 100 codons, novel ORFs encoded by lncRNAs and other non-coding RNAs, and other regulatory non-coding RNAs. This dissertation presents methods and tools to efficiently process and analyze large RNA-Seq datasets for the purpose of identifying and characterizing orphans and other yet unannotated genes. First, I developed MetaOmGraph, a Java tool for interactive exploratory analysis of large expression datasets. MetaOmGraph provides an easy framework to explore expression patterns of genes and transcripts and build hypotheses about their functional roles. Next, I developed orfipy, a fast and flexible Open Reading Frame (ORF) finder for quick and accurate annotation of Coding Sequences (CDS) in large transcriptomic datasets. Third, I present pyrpipe, a Python package for straightforward and reproducible analysis of RNA-Seq datasets.
Using these tools and methods, a reproducible and scalable pipeline for annotating the human "dark transcriptome" is proposed. Leveraging terabytes of tumor and non-diseased RNA-Seq data, the pipeline identified thousands of tissue- and tumor-specific transcripts coding for novel peptides. Phylostratigraphy and synteny analysis revealed that the majority of novel genes are orphans encoding a human-specific protein. The expression and translation status of these novel transcripts are validated using independent RNA-Seq and Ribo-seq data. Pan-cancer analysis of these novel genes reveals their differential expression and association with overall patient survival, suggesting their potential to be utilized for novel diagnostic and therapeutic interventions.
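
    To make the ORF-finding step mentioned in this abstract concrete, here is a minimal sketch of a forward-strand ORF scan. It does not use or reproduce orfipy's API or algorithm: the `find_orfs` function, its `min_codons` parameter, and the example sequence are hypothetical. It assumes an ORF starts at the first ATG in a frame and ends at the next in-frame stop codon, and it ignores the reverse strand.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for forward-strand ORFs.

    `start` is the index of the ATG; `end` is exclusive and includes the stop
    codon. `min_codons` counts codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons excluding the stop
                if n_codons >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


# Made-up sequence for illustration only.
example = "CCATGAAATAGGG"
orfs = find_orfs(example)
long_only = find_orfs(example, min_codons=3)
```

    Filtering the result for ORFs shorter than 100 codons would select candidates in the smORF size range described above.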

    African Americans and European Americans exhibit distinct gene expression patterns across tissues and tumors associated with immunologic functions and environmental exposures

    The COVID-19 pandemic has affected African American populations disproportionately with respect to prevalence and mortality. Expression profiles represent snapshots of combined genetic, socio-environmental (including socioeconomic and environmental factors), and physiological effects on the molecular phenotype. As such, they have potential to improve biological understanding of differences among populations, and provide therapeutic biomarkers and environmental mitigation strategies. Here, we undertook a large-scale assessment of patterns of gene expression between African Americans and European Americans, mining RNA-Seq data from 25 non-diseased and diseased (tumor) tissue-types. We observed the widespread enrichment of pathways implicated in COVID-19 and integral to inflammation and reactive oxygen stress. Chemokine CCL3L3 expression is upregulated in African Americans. GSTM1, encoding a glutathione S-transferase that metabolizes reactive oxygen species and xenobiotics, is upregulated. The little-studied F8A2 gene is up to 40-fold more highly expressed in African Americans; F8A2 encodes HAP40 protein, which mediates endosome movement, potentially altering the cellular response to SARS-CoV-2. African American expression signatures, superimposed on single-cell RNA reference data, reveal increased number or activity of esophageal glandular cells and lung ACE2-positive basal keratinocytes. Our findings establish basal prognostic signatures that can be used to refine approaches to minimize risk of severe infection and improve precision treatment of COVID-19 for African Americans. To enable dissection of causes of divergent molecular phenotypes, we advocate routine inclusion of metadata on genomic and socio-environmental factors for human RNA-sequencing studies. This article is published as Singh, U., Hernandez, K.M., Aronow, B.J. et al.
African Americans and European Americans exhibit distinct gene expression patterns across tissues and tumors associated with immunologic functions and environmental exposures. Sci Rep 11, 9905 (2021). doi:10.1038/s41598-021-89224-1.

    Accuracy of functional gene community detection in Saccharomyces cerevisiae by maximizing Generalized Modularity Density

    Identifying functionally cohesive gene communities from large data sets of expression data for individual genes is a key approach to understanding the molecular components of biological processes. Here, we compare the accuracy of twelve different approaches to infer gene co-expression networks and then find gene communities within the networks. Among the approaches used are ones involving a recently developed clustering method that identifies communities by maximizing Generalized Modularity Density (Qg). RNA-Seq data from 691 samples of S. cerevisiae (yeast) are analyzed. These data have been obtained from organisms grown under diverse environmental and developmental conditions and encompass varied mutant lines. To assess the accuracy of different approaches, we introduce a statistical measure, the Average Adjusted Rand Index (AARI) score, which compares their results to Gene Ontology (GO) term associations. Inferring gene networks using the Context Likelihood of Relatedness (CLR) and subsequently clustering by maximizing Generalized Modularity Density is found to identify the most significant functional communities. Also, to quantify the extent to which the identified communities are biologically relevant, a GO term enrichment analysis is performed. The results indicate that many of the communities found by maximizing Generalized Modularity Density are enriched in genes with known biological functions. Furthermore, some of the communities contain genes of unknown function, enabling inference of potentially novel functional interactions involving these genes. In addition, some genes are species-specific orphan genes; assignment of these orphan genes to communities enriched in a particular biological process provides a method to infer the biological process in which they are involved.
We focus on a few communities that are highly significantly enriched in a particular biological process, and develop experimentally testable predictions about the orphan genes in these communities. This preprint is made available through bioRxiv at doi:10.1101/2022.12.28.522153.